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Abstract 

In  this  paper,  an  adaptive  nonlinear  estimator  is  de¬ 
veloped  to  identify  the  Euclidean  coordinates  of  feature 
points  on  a  moving  object  using  a  single  fixed  camera. 
No  explicit  model  is  used  to  describe  the  movement  of 
the  object.  Homography-based  techniques  are  used  in 
the  development  of  the  object  kinematics,  while  Lya¬ 
punov  design  methods  are  utilized  in  the  synthesis  of 
the  adaptive  estimator.  Simidation  results  are  included 
to  demonstrate  the  performance  of  the  estimator. 

1  Introduction 

The  recovery  of  Euclidean  coordinates  of  feature  points  on 
a  moving  object  from  a  sequence  of  images  is  a  mainstream 
research  problem  with  significant  potential  impact  for  appli¬ 
cations  such  as  autonomous  vehicle/robotic  guidance,  nav¬ 
igation,  path  planning  and  control.  It  bears  a  close  resem¬ 
blance  to  the  classical  problem  in  computer  vision,  known  as 
“Structure  from  Motion  (SFM)”,  which  is  the  determination 
of  3D  structure  of  a  scene  from  its  2D  projections  on  a  mov¬ 
ing  camera.  Although  the  problem  is  inherently  nonlinear, 
typical  SFM  results  are  based  on  linearization  based  meth¬ 
ods  such  as  extended  Kalman  filtering  [1,  6,  19).  In  recent 
publications,  some  researchers  have  recast  the  problem  as 
state  estimation  of  a  continuous-time  perspective  dynamic 
system,  and  have  employed  nonlinear  system  analysis  tools 
in  the  development  of  state  observers  that  identify  motion 
and  structure  parameters  [13,  14).  To  summarize,  these  pa¬ 
pers  show  that  if  the  velocity  of  the  moving  object  (or  cam¬ 
era)  is  known,  and  satisfy  certain  obervability  conditions,  an 
estimator  for  the  unknown  Euclidean  position  of  the  feature 
points  can  be  developed.  In  [4],  an  observer  for  the  estima¬ 
tion  of  camera  motion  was  presented  based  on  perspective 
observations  of  a  single  feature  point  from  the  (single)  mov¬ 
ing  camera.  The  observer  development  was  based  on  sliding 
mode  and  adaptive  control  techniques,  and  it  was  shown 
that  upon  satisfaction  of  a  persistent  excitation  condition 
[21],  the  rotational  velocity  could  be  fully  recovered,  and 
the  translational  velocity  could  be  recovered  upto  a  scale 

1This  work  was  supported  in  part  by  two  DOC  Grants,  an 
ARO  Automotive  Center  Grant,  a  DOE  Contract,  a  Honda  Cor¬ 
poration  Grant,  and  a  DARPA  Contract. 


factor.  The  depth  ambiguity  attributed  to  the  unknown 
scale  factor  was  resolved  by  resorting  to  stereo  vision.  The 
afore-mentioned  approach  requires  that  a  model  for  object 
motion  be  known. 

In  this  paper,  we  present  a  unique  nonlinear  estimation 
strategy  to  simultaneously  estimate  the  velocity  and  struc¬ 
ture  of  a  moving  object  using  a  single  camera.  Roughly 
speaking,  satisfaction  of  a  persistent  excitation  condition 
(similar  to  [4]  and  others)  allows  the  determination  of  the 
inertial  coordinates  for  all  the  feature  points  on  the  object. 
A  homography-based  approach  is  utilized  to  develop  the  ob¬ 
ject  kinematics  in  terms  of  reconstructed  Euclidean  infor¬ 
mation  and  image-space  information  for  the  fixed  camera 
system.  The  development  of  object  kinematics  relies  on  the 
work  presented  in  [2]  and  [17],  and  requires  a  priori  knowl¬ 
edge  of  a  single  geometric  length  between  two  feature  points 
on  the  object.  A  novel  nonlinear  integral  feedback  estima¬ 
tion  method  developed  in  our  previous  efforts  [5]  is  then 
employed  to  identify  the  linear  and  angular  velocity  of  the 
moving  object.  Identifying  the  velocities  of  the  object  fa¬ 
cilitates  the  development  of  a  measurable  error  system  that 
can  be  used  to  formulate  a  nonlinear  least  squares  adaptive 
update  law.  A  Lyapunov-based  analysis  is  then  presented 
that  indicates  if  a  persistent  excitation  condition  is  satisfied 
then  the  time-varying  Euclidean  coordinates  of  each  feature 
point  can  be  determined. 

While  the  problem  of  estimating  the  motion  and  Euclidean 
position  of  features  on  a  moving  object  is  addressed  in  this 
paper  by  using  a  fixed  camera  system,  the  development  can 
also  be  recast  for  the  camera-in-hand  problem  where  a  mov¬ 
ing  camera  observes  stationary  objects.  That  is,  by  recast¬ 
ing  the  problem  for  the  camera-in-hand,  the  development 
in  this  paper  can  also  be  used  to  address  the  Simultane¬ 
ous  Localization  and  Mapping  (SLAM)  problem  [8],  where 
the  information  gathered  from  a  moving  camera  is  utilized 
to  estimate  both  the  motion  of  the  camera  (and  hence,  the 
relative  position  of  the  vehicle/robot)  as  well  as  position  of 
static  features  in  the  environment. 

2  Geometric  Model 

In  order  to  develop  a  geometric  relationship  between  the 
fixed  camera  and  the  moving  object,  we  define  an  orthogonal 
coordinate  frame,  denoted  by  T ,  attached  to  the  object  and 


Figure  1:  Geometric  relationships  between  a  fixed  cam¬ 
era  and  the  current  and  reference  positions  of  a 
moving  object  in  its  field  of  view. 

an  inertial  coordinate  frame,  denoted  by  X,  whose  origin 
coincides  with  the  optical  center  of  the  fixed  camera  (see 
Figure  1).  Let  the  3D  coordinates  of  the  ith  feature  point 
on  the  object  be  denoted  as  the  constant  Si  £  R3  relative  to 
the  object  reference  frame  J-,  and  m;(t)  £  R3  relative  to  the 
inertial  coordinate  system  X,  such  that 


It  is  evident  from  (5)  that  R(t)  and  Xf{t)  quantify  the  ro¬ 
tation  and  translation,  respectively,  between  the  frames  T 
and  X*.  As  also  illustrated  in  Figure  1,  n*  £  R3  denotes 
the  constant  normal  to  the  plane  7r*  expressed  in  the  coor¬ 
dinates  of  X,  and  the  constant  projections  of  fh*  along  the 
unit  normal  n*,  denoted  by  d*  £  R  are  given  by 

d *  =  n*T  fh*.  (6) 

Using  (6),  it  can  be  easily  seen  that  the  relationship  in  equa¬ 
tion  (4)  can  now  be  expressed  as  follows 

(  fj,  .  ^  f  *t\  -  « 

rm=  f?  +  —  n 

S  ^  i  (7) 

H 

where  H{t)  £  R3x3  denotes  a  Euclidean  homography  [11]. 

Since  a  video  camera  is  our  sensing  device,  we  must  develop 
a  geometric  relationship  between  the  3D  world  in  which  the 
moving  object  resides  and  its  2D  projection  in  the  image 
plane  of  the  camera.  To  this  end,  we  define  normalized 
Euclidean  coordinates,  denoted  by  mi(t),m*  £  R3  for  the 
feature  points  as  follows 


As  seen  by  the  camera,  each  of  these  feature  points  have 
projected  pixel  coordinates,  denoted  by  pi(t),p*  £  R3,  ex¬ 
pressed  relative  to  X  as  follows 

Pi=  [  Ui  Vi  1  ] T  P*  =  [  u*  v*  1  ] T  •  (9) 


mi  =  [  Xi  yt  Zi  ]T  .  (1) 

It  is  assumed  that  the  object  is  always  in  the  field  of  view 
of  the  camera,  and  hence  the  distances  from  the  origin  of 
X  to  all  the  feature  points  remain  positive  (i.e. ,  Zi(t)  >  e, 
where  e  is  an  arbitrarily  small  positive  constant).  To  relate 
the  coordinate  systems,  let  R(t)  £  50(3)  and  Xf(t)  £  R3 
denote  the  rotation  and  translation,  respectively,  between 
T  and  X.  Also,  let  three  of  the  non-collinear  feature  points 
on  the  object,  denoted  by  Oi  Mi  =  1,  2,  3,  define  the  plane 
7r  shown  in  Figure  1.  Now  consider  the  object  to  be  at 
some  fixed  reference  position  and  orientation,  denoted  by 
T* ,  as  defined  by  a  reference  image  of  the  object.  We  can 
similarly  define  the  constant  terms  fh*,  R*  and  Xf,  and  the 
plane  7r*  for  the  object  at  the  reference  position.  From  the 
geometry  between  the  coordinate  frames  depicted  in  Figure 
1,  the  following  relationships  can  be  developed 

fhi  =  Xf  +  Rsi  (2) 

fh*  =  x*f  +  R*  Si  .  (3) 

After  solving  (3)  for  Si  and  then  substituting  the  resulting 
expression  into  (2),  we  have 

fhi  =  Xf  +  Rfh*  (4) 

where  R  ( t )  £  SO  (3)  and  Xf  (t)  £  R3  are  new  rotational  and 
translational  variables,  respectively,  defined  as  follows 

R  =  R(R*)t  Xf=Xf  —  Rx*f.  (5) 


The  projected  pixel  coordinates  of  the  feature  points  are 
related  to  the  normalized  Euclidean  coordinates  by  the  pin¬ 
hole  model  of  [10]  such  that 

Pi  =  Arm  P*  =  Am*  (10) 

where  A  £  R3x3  is  a  known,  constant,  upper  triangular  and 
invertible  intrinsic  camera  calibration  matrix  [18].  From  (7), 
(8)  and  (10),  the  relationship  between  image  coordinates 
of  the  corresponding  feature  points  in  T  and  T*  can  be 
expressed  as  follows 


Pi  =  —  A(R  +  xhi(n*)  )A  p* 

Z%  \  / 


Qt-i 


G 


(11) 


where  ai  £  R  denotes  the  depth  ratio,  and  Xhi{t) 


Xf{t) 

d* 


£ 


R3  denotes  the  scaled  translation  vector.  The  matrix  G(t)  £ 
R3x3  defined  in  (11)  is  a  full  rank  homogeneous  collineation 
matrix  defined  upto  a  scale  factor  [18].  If  the  structure  of 
the  moving  object  is  planar,  all  feature  points  lie  on  the 
same  plane,  and  hence  the  distances  d*  defined  in  (6)  is 
the  same  for  all  feature  points,  henceforth  denoted  as  d* . 
In  this  case,  the  collineation  G(t)  is  defined  upto  the  same 
scale  factor,  and  hence,  one  of  its  elements  can  be  set  to 
unity  without  loss  of  generality.  G(t)  can  then  be  estimated 
from  a  set  of  linear  equations  (11)  obtained  from  at  least 
four  corresponding  feature  points  that  are  coplanar  but  non- 
collinear.  If  the  structure  of  the  object  is  not  planar,  the 


Virtual  Parallax  method  described  in  [18]  could  be  utilized. 
An  overview  of  the  determination  of  the  collineation  matrix 
Gft)  and  the  depth  ratios  a;  ft)  using  both  the  methods  are 
given  in  Appendix  A.  Based  on  the  fact  that  the  intrinsic 
camera  calibration  A  is  known  apriori,  we  can  then  deter¬ 
mine  the  Euclidean  homography  Lift).  By  utilizing  various 
techniques  (see  algorithms  in  [11,  23]),  Lift)  can  be  decom¬ 
posed  into  its  constituent  rotation  matrix  R(t),  unit  normal 

X  »  (~fj\ 

vector  n* ,  scaled  translation  vector  Xhft)  =  1  ;  and  the 

depth  ratio  on  ft).  It  is  assumed  that  the  constant  rotation 
matrix  R*  is  known.  Rft)  can  therefore  be  computed  from 
(5).  Hence  Rft),  Rft),Xh(t)  and  an(i)  are  known  signals  that 
can  be  used  in  the  subsequent  analysis. 


Remark  1  The  subsequent  development  requires  that  the 
constant  rotation  matrix  R*  be  known. 


3  Object  Kinematics 

To  quantify  the  translation  of  T  relative  to  the  fixed  coordi¬ 
nate  system  F* ,  we  define  ev(t)  £  R3  in  terms  of  the  image 
coordinates  of  the  feature  point  Oi  as  follows 

e„  =  [  Mi  —  Mi  vi  —  Vi  —  ln(ai)  ]  T  .  (12) 

In  (12)  and  in  the  subsequent  development,  any  point  Oi  on 
7r  could  have  been  utilized;  however,  to  reduce  the  notational 
complexity,  we  have  elected  to  select  the  feature  point  0\. 
The  signal  ev  ( t )  is  measurable  since  the  first  two  elements  of 
the  vector  are  obtained  from  the  images  and  the  last  element 
is  available  from  known  signals  as  discussed  in  the  previous 
section.  Following  the  development  in  [5],  the  translational 
kinematics  can  be  obtained  as  follows 

ev  =  ^AelR  [ve  -  [si]x  we]  (13) 

zi 


In  (16),  the  Jacobian-like  matrix  L^ft)  £  R3x3  is  defined  as 


/ 


Lw  —  h  —  2  M  x  + 


1  - 


sine  (0) 


\ 


V 


sine*  — 


M> 


(17) 


2  ) 


where  [m]x  denotes  the  3x3  skew-symmetric  form  of  uft), 
I3  £  R3x3  is  the  3x3  identity  matrix,  and 

sin  0  ft) 


sine  (0  (f))  = 


0(f) 


From  (13)  and  (16),  the  kinematics  of  the  object  under  mo¬ 
tion  can  be  expressed  as 

e  =  Jv  (18) 

where  e(t)  —  [  ej  ]T  £  R6,  v(t)  =  [  vf  off  ]T  £ 
R6,  and  J(t)  £  R6x6  is  a  Jacobian-like  matrix  defined  as 


J  = 


r  ^AelR  -?±AelR{s1}> 

zi  zi 

O3  L^R 


(19) 


where  O3  £  R3x3  denotes  a  zero  matrix. 


Remark  2  In  the  subsequent  analysis,  it  is  assumed  that  a 
single  geometric  length  si  £  R3  between  two  feature  points 
is  known.  With  this  assumption,  each  element  of  J(t)  is 
known  with  the  possible  exception  of  the  constant  z\  £  R. 
The  reader  is  referred  to  [5]  where  it  is  shown  that  z*  can 
also  be  computed  given  si. 


Remark  3  It  is  assumed  that  the  object  never  leaves  the 
field  of  view  of  the  camera;  hence,  from  (12)  and  (15),  eft)  £ 
Loo  -It  is  also  assumed  that  the  object  velocity,  acceleration 
and  jerk  are  bounded,  i.e.,  v(t),vft),vft)  £  Loo;  hence  the 
structure  of  (18)  allows  us  to  show  that  eft), eft),  e  (t)  £ 


where  the  notation  [si]  denotes  the  3x3  skew  symmetric 
form  of  si,  ve(t),  uieft)  £  R3  denote  the  unknown  linear 
and  angular  velocity  of  the  object  expressed  in  the  local 
coordinate  frame  IF,  respectively,  and  Ae;  (f)  £  R3x3  is  a 
function  of  the  camera  intrinsic  calibration  parameters  and 
image  coordinates  of  the  ith  feature  point  as  shown  below 


Aei  =  A- 


0  0  Mi 

0  0  Mi 

0  0  0 


(14) 


Similarly,  to  quantify  the  rotation  of  T  relative  to  T* ,  we 
define  ew(t)  £  R3  using  the  axis-angle  representation  [22]  as 
follows 

ew  =  M0  (15) 

where  uft )  £  R3  represents  a  unit  rotation  axis,  and  0(f)  £  R 
denotes  the  rotation  angle  about  u(t)  that  is  assumed  to  be 
confined  to  the  region  — 7r  <  0(f)  <  71.  After  taking  the  time 
derivative  of  (15),  the  following  expression  can  be  obtained 
(see  [5]  for  further  details) 


4  Identification  of  Velocity 


In  [5],  an  estimator  was  developed  for  online  asymptotic 
identification  of  the  signal  eft).  Designating  e(f)  as  the  es¬ 
timate  for  eft),  the  estimator  was  designed  as  follows 

e  =  f  (I<  +  Ie)e(r)dT  + 

Jt0 

+{K  +  h)e(t)  (20) 

where  eft)  —  eft)  —  eft)  £  R6  is  the  estimation  error  for  the 
signal  eft),  K,p  £  R6x6are  positive  definite  constant  diag¬ 
onal  gain  matrices,  I&  £  R6x6  is  the  6x6  identity  matrix, 
to  is  the  initial  time,  and  sgn(e(t))  denotes  the  standard 
signum  function  applied  to  each  element  of  the  vector  eft). 
The  reader  is  referred  to  [5]  and  the  references  therein  for 
analysis  pertaining  to  the  development  of  the  above  estima¬ 
tor.  In  essense,  it  was  shown  in  [5]  that  the  above  estimator 
asymptotically  identifies  the  signal  eft)  provided  the  follow¬ 
ing  inequality  is  satisifed  for  each  diagonal  element  pi  of  the 
gain  matrix  p, 

Pi  >  |e»|  +  |  e'i 


e^j  —  L^Ruje  . 


(16) 


Vi  =  1,2,  ...6. 


(21) 


Hence,  ii  ( t )  —>  efit)  as  t  — >  oo,V*  =  1,2,  ...6.  Since  J(t)  is 
known  and  invertible,  the  six  degree-of-freedom  velocity  of 
the  moving  object  can  be  identified  as  follows 

v(t)  =  J_1(f)  e  ( t ),  and  hence  v{t)  —¥  v(t)  as  t  — >  oo.  (22) 

5  Euclidean  Reconstruction  of  Feature  Points 

The  central  theme  of  this  paper  is  the  identification  of 
Euclidean  coordinates  of  the  feature  points  on  a  moving  ob¬ 
ject  (be.,  the  vector  s;  relative  to  the  object  frame  T,  fhi{t) 
and  fh*  relative  to  the  camera  frame  X  for  all  i  feature  points 
on  the  object).  To  facilitate  the  development  of  the  estima¬ 
tor,  we  first  define  the  extended  image  coordinates,  denoted 
by  pei(t)  £  R3,  for  any  feature  point  Oi  as  follows 

Pei  =  [  Ui  Vi  —  ln(ai)  ] T  .  (23) 

Following  the  development  of  translational  kinematics  in 
(13),  it  can  be  shown  that  the  time  derivative  of  (23)  is 
given  by 

Pei  =  ^AeiR  [ve  +  [we]x  Si] 

Zi 

=  WiVvw6i  (24) 

where  Wfi.)  £  R 3x3,Vvw(t)  £  R3x4and  9i  £  R4  are  defined 
as  follows 


Wi  =  anAeiR  (25) 

Vvw  =  [  ve  [tue]x  ]  (26) 


The  elements  of  W,(.)  are  known  and  bounded,  and  an  es¬ 
timate  of  Vvw(t),  denoted  by  Vvw(t),  is  available  by  appro¬ 
priately  re-ordering  the  vector  v(t)  given  in  (22). 

Our  objective  is  to  identify  the  unknown  constant  9i  in  (24). 
To  facilitate  this  objective,  we  define  a  parameter  estimation 
error,  denoted  by  9  fit)  £  R4,  as  follows 

9i(t)  =  9i  -  9i(t)  (28) 

where  9i(t)  £  R4  is  a  subsequently  designed  parameter  up¬ 
date  signal.  We  also  introduce  a  measurable  filter  signal 
Wffit)  £  R3x4,  and  a  non- measurable  filter  signal  vfit)  £  R3 
defined  as  follows 

Wfi  =  -piWfi  +  WiV^  (29) 

Vi  =  -PiVi  +  WiVvw0i  (30) 

where  (5i  £  R  is  a  scalar  positive  gain,  and  Vvw(t)  =  Vvw(t)  — 
Vvw(t)  £  R3x4  is  an  estimation  error  signal. 

Motivated  by  the  subsequent  stability  analysis,  we  design 
the  following  estimate,  denoted  by  pei  (■ t )  £  R3,  for  the  ex¬ 
tended  image  coordinates, 

Pei=  PiPei  +  Wfi  9i  AWiVvw9 i  (31) 

where  pei(t)  =  pei(t)  —  pei{t)  £  R3  denotes  the  measurable 
estimation  error  signal  for  the  extended  image  coordinates 


of  the  feature  points.  The  time  derivative  of  this  estimation 
error  signal  is  computed  from  (24)  and  (31)  as  follows 

Pei=  —fiiPei  —  Wfi  9i  +WiVvw9i  +  WiVvw9i.  (32) 

From  (30)  and  (32),  it  can  be  shown  that 

Pei  =  Wfi9i  +  Vi-  (33) 

Based  on  the  subsequent  analysis,  we  select  the  following 
least-squares  update  law  [21]  for  9i(t) 

9i=  LiWjiPei  (34) 

where  Li(t)  £  R4x4is  an  estimation  gain  that  is  recursively 
computed  as  follows 

jt{Li1)=WjiWfi.  (35) 

Remark  4  In  the  subsequent  analysis,  it  is  required  that 
L~ 1  (0)  in  (35)  be  positive  definite.  This  requirement  can  be 
easily  satisfied  by  selecting  the  appropriate  non-zero  initial 
values. 

Remark  5  In  the  analysis  provided  in  [5],  it  was  shown  that 
a  filter  signal  r(t)  £  R6  defined  as  r(t)  =  e(t)+  e  ( t )  £  L x 
PlL2.  From  this  result  it  is  easy  to  show  that  the  signals  e(t), 
e  ( t )  £  Z/2  [7].  Since  J(t )  £  Loo  and  invertible,  it  follows  that 
J_1(t)  e  (t)  £  L2.  Hence  v(t)  =  v(t)  —  v(t)  £  L2,  and  it  is 
easy  to  show  that  t4«>(i)  £  Li,  where  the  notation  ||.||oo 

II  OO 

denotes  the  induced  oo-norm  of  a  matrix  [15]. 

5.1  Analysis 

Theorem  1  The  update  law  defined  in  (34)  ensures  that 
9i(t)  — >  0  as  t  — >  oo  provided  that  the  following  persistent 
excitation  condition  [21]  holds 

ptQ+T 

Tih  <  /  W?i{T)Wfi(T)dT  <  72/4  (36) 

Jt0 

and  provided  that  the  gains  pt  satisfy  the  following  inequality 

Pi  >  ku  +  k2i\\Wi\\200  (37) 

where  lo,7i,72,L,  fcii,fc2;  £  R  are  positive  constants,  h  £ 
R4x4  is  the  4x4  identity  matrix,  the  notation  H-H^  denotes 
the  induced  oo-norm  of  a  matrix  [15]  and  ku  must  be  selected 
such  that 

ku  >  2.  (38) 

Proof:  Let  V(t)  £  R  denote  a  non-negative  scalar  function 
defined  as  follows 

V±±9jLr%  +  ±r,J  w  (39) 


After  taking  the  time  derivative  of  (39),  the  following  ex¬ 
pression  can  be  obtained 


V  = 


< 


-\\\wfih\\2  -of wfin,- PM2 

WiVvwOi 

~\\Wfihf  -  Mvif 
+  INI  II^IL  IlftJ  \M 

II  1 1  oo 

+  \Wfihl  Ihill  -  ku  II riiW2  +  ku  \\riiW2 

+k2i\\Wi\\l\\r,i\\2-k2i\\Wi\\l\\r,if  (40) 


After  utilizing  the  nonlinear  damping  argument  [16],  we  can 
simplify  (40)  further  as  follows 


*  5  -G-£)iMf 

-(/3i-fcii-feM||Wi||^)|H|2 

+TL||^||2||t4w||2  (41) 

rZ2i  II  I  loo 

where  ku,  k2i  £  R  are  positive  constants  as  previously  men¬ 
tioned.  The  gains  ku,  k2i,  and  /3i  must  be  selected  to  ensure 


that 

1  1 

(42) 

2-fe^  ^  ^>0 

ft  -  ku  -  k2i  WWfi^  >  p2i  >  0 

(43) 

where  Hu>khi  £  ®  are  positive  constants.  The  gain  con¬ 
ditions  given  by  (42)  and  (43)  allow  us  to  formulate  the 
conditions  given  by  (37)  and  (38),  as  well  as  allowing  us  to 
further  upper  bound  the  time  derivative  of  (39)  as  follows 

V  —  —  Mli  ||W/t0i||  -M2ilkll  +7  Iftll  ||^uu'||  ■  (^4) 

II  II  K2i  II  I  loo 


From  the  discussion  given  in  Remark  5,  we  can  see  that  the 
last  term  in  (44)  is  Li,  hence, 

[  y—  ||#i(t)||2  |[k,u,(t)||  dr<£  (45) 

Jo  i  II  lloo 

where  £  £  R  is  a  positive  constant.  From  (39),  (44)  and  (45), 
we  can  conclude  that 

Jo  (»ii  ||w>i(T)0i(T)||  +  02i  \\Vi(r)\\2^j  dr 

<  V(Q)-V(oo)+e.  (46) 

It  can  be  concluded  from  (46)  that  Wffit)9fit),  pfit)  £  L2. 
From  (46)  and  the  fact  that  V(t)  is  non-negative,  it  can 
be  concluded  that  V(t)  <  V(0)  +  e  for  any  t,  and  hence 
V(t)  £  Loo.  Therefore,  from  (39),  rjfit)  £  Loo  and 
ft  ( t)L~1(t)di(t )  £  Loo-  Since  L“x( 0)  is  positive  definite,  and 
the  persistent  excitation  condition  in  (36)  is  assumed  to  be 
satisfied,  we  can  use  (35)  to  show  that  Lft(t)  is  always  pos¬ 
itive  definite;  hence,  it  must  follow  that  9  fit)  £  Loo-  Since 
v(t)  £  Loo  as  shown  in  [5],  it  follows  from  (26)  that  Vvw(t)  £ 
Loo-  Hence  from  (29),  and  the  fact  that  Wfi.)  defined  in 
(25)  are  composed  of  bounded  terms,  Wffit),  Wffit)  £  Loo 


[7],  and  consequently,  W ffit)9fit)  £  Lx.  Therefore,  from 
(33),  we  can  see  that  pefit)  £  Loo-  It  follows  from  (34) 

that  di  ( t )  £  Loo,  and  hence  ft  ( t )  £  Loo-  From  the 
fact  that  Wffit), 6i  ( t )  £  Loo,  it  is  easy  to  show  that 
it  ( Wfi(t)6i(t £  Loo-  Hence,  W ffit)8fit)  is  uniformly  con¬ 
tinuous  [9[.  Since  we  also  have  that  W ffit)9fit)  £  L2,  we  can 
conclude  that  [9] 


W ffit)9fit)  — >  0  as  t  — >  oo.  (47) 

As  shown  in  Appendix  C,  if  the  signal  W/;(t)  satisifies  the 
persistent  excitation  condition  [21]  given  in  (36),  then  it  can 
be  concluded  from  (47)  that 

ft(t)  — ¥  0  as  t  — >  oo.  (48) 


□ 


Remark  6  It  can  be  shown  that  the  output  Wfi(t)  of  the 
filter  defined  in  (29)  is  persistently  exciting  if  the  input 
Wfit)Vjw(t )  to  the  filter  is  persistently  exciting  [20].  Hence, 
the  condition  in  (36)  is  satisfied  if 

l3h<  ft  °+T  Vvwfir)WT (t)W7(t)VU(t)  dr  <  7474 

U  S  y  ✓ 

w 

(49) 

where  73,74  £  Rare  positive  constants.  It  can  be  shown 
upon  expansion  of  the  integrand  W(t)  £  R4x4  of  (49)  that 
even  if  only  one  of  the  components  of  translational  velocity 
is  non-zero,  the  first  element  of9i(t),  i.e.  -hr,  will  converge 
to  the  correct  value.  It  should  be  noted  that  the  translational 
velocity  of  the  object  has  no  bearing  on  the  convergence  of 
the  remaining  three  elements  of  9  fit),  and  unfortunately,  it 
seems  that  no  inference  can  be  made  about  the  relationship 
between  convergence  of  the  three  remaining  elements  of  9  fit) 
and  the  rotational  velocity  of  the  object. 


Remark  7  As  stated  in  the  previous  remarks,  the  estima¬ 
tion  of  object  velocity  requires  the  knowledge  of  the  constant 
rotation  matrix  R*  £  R3x3  and  a  single  geometric  length 
si  £  R3  on  the  object.  Then,  utilizing  (8),  (10),  (27)  and 
(34),  the  estimates  for  fh*,  denoted  by  m*(t)€  R3,  can  be 
obtained  as  follows 

m*(t)  =  1  A~xp*  (50) 

[ftW], 

where  the  term  in  the  denominator  denotes  the  first  element 
of  the  vector  9 fit).  Similarly,  the  estimates  for  the  time 
varying  Euclidean  position  of  the  feature  points  on  the  object 
relative  to  the  camera  frame,  denoted  by  mfit)£  R3,  can  be 
calculated  as  follows 

Si  fit)  =  - ji - =r~A~1pi(t). 

a  fit)  9  fit) 


(51) 


Figure  2:  Estimated  parameters  for  the  second  feature 
point  on  the  simulated  moving  object. 


6  Simulation  Results 

The  adaptive  estimation  algorithm  described  in  Section  5 
was  built  on  top  of  an  existing  simulator  that  was  previously 
utilized  to  verify  the  performance  of  the  veclocity  estimator 
of  Section  4  and  described  in  detail  in  [5] .  We  selected  a  pla¬ 
nar  object  with  four  feature  points  initially  2  meters  away 
along  the  axis  of  the  camera  as  the  body  undergoing  motion. 
The  velocity  of  the  object  along  each  of  the  six  degrees  of 
freedom  was  set  to  0.2sin(£).  The  coordinates  of  the  ob¬ 
ject  feature  points  in  the  object’s  coordinate  frame  T  were 
arbitrarily  chosen  to  be  the  following 


Si  = 

[  1.0 

0.5  0.1  }T 

S2  = 

[  1-2 

-0.75 

0.1  ]T 

S3  = 

[  0.0 

-1.0 

0.1  }T 

«4  = 

[  -1.0 

0.5 

0.1  }T. 

The  object’s  reference  orientation  R*  relative  to  the  camera 
were  selected  as  diag(  1,— 1,— 1).  The  simulator  operated  at 
the  sampling  frequency  of  1  kHz.  The  estimator  gain  /3i 
was  set  to  20  for  all  i  feature  points.  It  was  observed  that 
the  estimates  9i(t)  converged  to  the  correct  values  within 
a  span  of  few  seconds.  As  an  example,  Figure  2  depicts 
the  convergence  of  62  (t)  which  can  be  used  to  compute  the 
Euclidean  coordinates  of  the  second  feature  point  S2  defined 
relative  to  the  object  frame  T  and  the  constant  z%.  defined 
relative  to  the  fixed  camera  frame  T  as  shown  in  (27). 

7  Conclusions 

This  paper  presented  an  adaptive  nonlinear  estimator  to 
identify  the  Euclidean  coordinates  of  feature  points  on  an 
object  under  motion  using  a  single  camera.  The  only  re¬ 
quirements  on  the  object  are  that  its  velocity  and  first  two 
time  derivatives  must  be  bounded,  the  orientation  of  the 
object  at  reference  position  relative  to  the  camera,  and  the 


Euclidean  coordinates  of  a  single  feature  point  relative  to  its 
coordinate  frame  must  be  known.  Lyapunov-based  system 
analysis  methods  and  homography-based  vision  techniques 
were  used  in  the  development  of  this  alternative  approach  to 
the  classical  problem  of  estimating  structure  from  motion. 
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Appendix  A:  Calculation  of  Homography  Figure  3:  Virtual  parallax. 


Homography  from  Coplanar  Feature  Points 

In  this  section,  we  present  a  method  to  estimate  the 
collineation  G(t)  by  solving  a  set  of  linear  equations  (11)  ob¬ 
tained  from  at  least  four  corresponding  feature  points  that 
are  coplanar  but  non-collinear.  Based  on  the  arguments  in 
[12],  a  transformation  is  applied  to  the  projective  coordi¬ 
nates  of  the  corresponding  feature  points  to  improve  the 
accuracy  in  the  estimation  of  G(t).  The  transformation  ma¬ 
trices,  denoted  by  P(t),  P*  G  R3x3,  are  defined  in  terms 
of  the  projective  coordinates  of  three  of  the  coplanar  non- 
collinear  feature  points  as  follows 

P  =[  Pl  P2  P3  |  P*  =  [  pt  p*2  P3  }  ■  (53) 

From  (11)  and  (53),  it  is  easy  to  show  that 

PG  =  GP*  (54) 

where 

G  =  P~1GP* 

=  diag  (oq\  «2  \  <*3  9 
=  diag(gi,  g2,  §3)  ■  (55) 

In  (55),  diag(.)  denotes  a  diagonal  matrix  with  arguments 
as  the  diagonal  entries.  Utilizing  (55),  the  relationship  in 
(11)  can  now  be  expressed  in  terms  of  G(t)  as  follows 

Qi  =  o-i  Gq*  (56) 

where 

qi  =  P~1Pi  (57) 

qt  =  P*^pt  (58) 

define  the  new  transformed  projective  coordinates.  Note 
that  the  transformation  normalizes  the  projective  coor¬ 
dinates,  and  it  is  easy  to  show  that  [  qi  92  53  ]  = 
[  qt  q2  gl  ]  =  I3  G  R3xS  where  I3  is  the  3  x 


3  identity  matrix.  Given  the  transformed  image  co¬ 
ordinates  of  a  fourth  matching  pair  of  feature  points 
q4 (t)  =  [  q4u(t)  q4v(t)  q4w{t)  ] T  G  R3  and  ql  = 

[  qlu  qtv  ql-w  ]T  G  R3,  we  have 

g4  =  04(5(24,  (59) 


where  it  can  be  shown  that 


Q4u 

Q4w  *  Ot  3 
Z  0.4  u 

Qlw  «i 

Q.4v 

0.4w  *  Q!3 

*  0.4v 

qAw  OL2 

Q_4w 

0:4 

q2w 

CK3 

(60) 


The  above  set  of  equations  can  be  solved  for 


a4  (f) 
03(f) 


and 


03 (f)G(f)  =  diag  (  — 7-r ,  — ^7-7 , 1  j  .  Since  the  camera  in- 

\«1W  «2  (f)  J 

trinsic  calibration  matrix  is  assumed  to  be  known,  we  can 
obtain  the  scaled  Euclidean  homography  which  was  defined 
in  (7)  as  03 =  a.3{t)A~1G{t)A.  As  noted  before, 
H(t)  can  be  decomposed  into  its  constituent  rotation  ma¬ 
trix  R(t),  unit  normal  vector  n* ,  scaled  translation  vector 
X  f  (t") 

Xh(t)  —  J  and  the  depth  ratio  03(f).  With  the  knowl¬ 


edge  of  03(f)  and  ,  we  can  calculate  the  depth  ratios 

03(f) 

oi,  02,  03,  and  o4  for  all  feature  points. 


Virtual  Parallax  Method 


In  general,  all  feature  points  of  interest  on  the  moving  ob¬ 
ject  may  not  lie  on  a  plane.  In  such  a  case,  based  on  the 
development  in  [18],  any  three  feature  points  on  the  object 
may  be  selected  to  define  the  plane  n*  shown  in  Figure  3. 
All  feature  points  Oi  on  a  plane  satisfy  (11).  Consider  a  fea¬ 
ture  point  Oj  on  the  object  that  is  not  on  the  plane  n* .  Let 
us  define  a  virtual  feature  point  O'  ,  on  n* ,  defined  at  the 
point  of  intersection  of  the  vector  from  the  optical  center  of 


the  camera  to  Oj  and  the  plane  7 r*.  Let  p*j  be  the  projective 
image  coordinates  of  the  point  Oj  (and  O')  on  the  image 
plane  when  the  object  is  at  the  reference  position  denoted 
by  T* .  As  shown  in  Figure  3,  when  the  object  is  viewed 
from  a  different  pose,  resulting  from  either  a  motion  of  the 
object  or  a  motion  of  the  camera,  the  actual  feature  point 
Oj  and  the  virtual  feature  point  O)  projects  to  Pj(t)  and 
p'j{t),  respectively,  on  the  image  plane  of  the  camera.  For 
any  feature  point  Oj ,  both  pj  ( t )  and  p'  ( t )  line  on  the  same 
epipolar  line  lj  [18]  that  is  given  by 


The  signal  Xh  ( t )  is  directly  obtained  from  the  decomposition 
of  Euclidean  homography  matrix  H(t).  Hence,  the  depth  ra¬ 
tios  for  feature  points  Oj  not  lying  on  the  plane  7r*  can  be 
computed  as  follows 


II Mx  mi\\ 

||  [xh\xRm* 


(69) 


Appendix  B:  Extension  to  Camera-in-Hand 


h  »  pj  x  pj  (61) 

where  x  denotes  the  cross  product  of  the  two  vectors.  Since 
the  projective  image  coordinates  of  corresponding  coplanar 
feature  points  satisfy  (11),  we  have 

lj  =  pj  x  Gpj .  (62) 

Based  on  the  constraint  that  all  epipolar  lines  meet  at  the 
epipole  [18],  we  can  select  a  set  of  any  three  non-coplanar 
feature  points  such  that  the  epipolar  lines  satisfy  the  follow¬ 
ing  constraint 


A  practically  significant  extension  to  the  fixed  camera  sys¬ 
tem  is  the  case  where  the  camera  can  move  relative  to  the 
object.  For  example,  as  shown  in  Figure  4,  a  camera  could 
be  mounted  on  the  end-effector  of  a  robot  and  used  to  scan 
an  object  in  its  workspace  to  determine  its  structure,  as  well 
as  determine  the  robot’s  position.  Let  three  feature  points 
on  the  object,  denoted  by  0\,0i  and  O3  define  the  plane 
7 r*  in  Figure  4.  Based  on  the  development  in  [3],  the  signal 
e(t)  £  R6  defined  previously  in  (12)  and  (15)  now  quantifies 
the  motion  of  the  camera  relative  to  its  reference  position. 
The  time  derivative  of  e(t)  can  be  expressed  as  follows 


|  lj  h  h  |  =  0  (63) 
Le.,|  pj  x  Gp*  pk  x  Gp*k  pi  x  Gp*  |  =  0.  (64) 

The  transformation  matrices,  denoted  by  P{t),  P*  £  R3x3, 
and  defined  in  (53),  are  constructed  using  the  image  coordi¬ 
nates  of  the  three  coplanar  feature  points  selected  to  define 
the  plane  n* .  After  coordinate  transformations  defined  in 
(57)  and  (58),  the  epipolar  constraint  of  (64)  now  becomes 

|  qt  X  Gq*  qj  X  Gq*  qk  X  Gql  \  =  0  (65) 

where  G(t)  £  R3x3  is  defined  in  (55).  As  shown  in  [18],  the 
set  of  homogeneous  equations  in  (65)  can  be  written  in  the 
form 

CjklX  =  0  (66) 

where  X  =  [gjg2,  gig2,  glg3,  sf<?3,  gigl,  hgl,  51^2^3] T  £ 
R' ,  and  the  matrix  Cjki  £  Rmx7  is  of  dimension  mx  7  where 
m  =  and  n  is  the  number  of  epipolar  lines,  one  for 

image  coordinates  of  each  feature  point.  Hence,  apart  from 
three  coplanar  feature  points  that  define  the  transformation 
matrices  in  (53),  we  will  require  atleast  five  additional  fea¬ 
ture  points  (i.  e.,  n  =  5)  in  order  to  solve  the  set  of  equations 
(66).  As  shown  in  [18],  we  can  then  calculate  G(t)  and  sub¬ 
sequently  Qi,  0:2,  03,  R  and  n*  as  previously  explained. 

Calculation  of  Depth  Ratios  for  Non-coplanar  Fea¬ 
ture  Points:  From  (4)  and  (8),  it  can  be  easily  shown  that 

mj  =  %  (i*  +^mi)  •  (67) 

After  multiplying  both  sides  of  the  equation  with  the  skew- 
symmetric  form  of  Xh{t),  denoted  by  [*h(t)]x  £  R3x3,  we 
have  [18] 

[xh]xmj  =  aj  ^ [ih] x  ^  +  [xh] x  Rm* 

=  aj[xh]xRm*.  (68) 


e  =  Jcv 


where  Jc(t)  £  R6x6  is  given  by 


Jc  = 


Ai  Aei[mi]x 

zi 

O3  —Lcj 


(70) 


(71) 


where  O3  £  R3x3  is  a  zero  matrix,  L„(t)  £  R3x3  has  exactly 
the  same  form  as  for  the  fixed  camera  case  in  (17),  and 
v(t)  =  [  uj  u!c  ]  £  R6  now  denotes  the  velocity  of  the 

camera  expressed  relative  to  I. 


With  the  exception  of  the  term  z\  £  R,  all  other  terms  in 
(71)  are  either  measurable  or  known  a  priori.  If  the  camera 
can  be  moved  away  from  its  reference  position  by  a  known 
translation  vector  Xfk  £  R3,  then  z\  can  be  computed  of¬ 
fline.  Decomposition  of  the  Euclidean  homography  between 
the  normalized  Euclidean  coordinates  of  the  feature  points 
obtained  at  the  reference  position,  and  at  Xfk  away  from  the 
reference  position,  respectively,  can  yield  the  scaled  trans¬ 
lation  vector  £  R3,  where  d*  £  R  is  the  distance  from 
a* 

the  initial  camera  position,  denoted  by  X* ,  to  the  plane  7r*. 
Then,  it  can  be  seen  that1 


d*  _  d* 
n*Tm*  n*T  A~1p\ 


(72) 


From  (70)  and  (71),  we  can  show  that  for  any  feature  point 
Oi 

Pei  =  -^AeiVc  +  Aei[mi]xUJc 
Zi 

=  WuVedi  +  W2iOJc  (73) 

where  pei{t)  £  R3  was  defined  previously  in  (23),  and 
Wu{.)  £  R3x3,  W2i(t)  £  R3x3  and  9i  £  R  are  given  as 

1  Note  that  for  any  feature  point  Oi  coplanar  with  7r*,  z*  could 
be  computed  this  way. 


From  (81),  (78)  and  (79),  it  can  be  shown  that 


Figure  4:  Geometric  relationships  for  the  camera- in-hand 
case. 

follows 

Wu  =  -onAei  (74) 

W2i  =  Aei[mi]x  (75) 

=  i-  (76) 

zi 

The  matrices  Wu{.)  and  W2i(t)  are  measurable  and 

bounded.  The  objective  is  to  identify  the  unknown  con¬ 
stant  9i  in  (76).  With  the  knowledge  of  z *  for  all  feature 
points  on  the  object,  their  Euclidean  coordinates  fh*  rela¬ 
tive  to  the  camera  reference  position,  denoted  by  I*,  can  be 
computed  from  (??)  through  (10).  As  before,  we  define  the 
mismatch  between  the  actual  signal  9i  and  estimated  signal 
9i(t )  as  9(t )  £  R  where  9{t)  =  9  —  9{t). 

To  facilitate  our  objective,  we  introduce  a  measurable  fil¬ 
ter  signal  Ci(t)  £  R3,  and  two  non-measurable  filter  signals 
Ki  ( t ) ,  r\i  (t)  £  R3  defined  as  follows 


Ci  = 

-PiCi  +  WliVc 

(77) 

Ki  = 

- PiKi  +  Wuvc9i 

(78) 

Vi  = 

- PiVi  +  w2iujc 

(79) 

where  /3i  £  R  is  a  scalar  positive  gain,  vc{t),uic{i)  £  R3are 
the  estimates  for  the  translational  and  rotational  velocity, 
respectively,  obtained  from  the  velocity  observer  in  Section 
4,  fic(t)  =  vc(t)  —vc(t)  is  the  mismatch  in  estimated  transla¬ 
tional  velocity,  and  ujc(t)  =  uic(t )  —  u ;c(t)  is  the  mismatch  in 
estimated  rotational  velocity.  Note  that  the  structure  of  the 
velocity  observer  and  the  proof  of  its  convergence  is  exactly 
identical  to  the  fixed  camera  case.  Likewise,  we  design  the 
following  estimator  for  the  pei(t) 

Pei=  PiPei  +  Ci  0i  +WuVc6i  +  W2iUJc.  (80) 
From  (73)  and  (80),  we  have 

Pei=  PiPei  ~  Ci  +WnVc9i  +  WliVc9i  +  W2iUlc ■  (81) 


Pei  =  Ci@i  +  Ki  +  rji-  (82) 

Based  on  the  subsequent  analysis,  we  select  the  following 

least-squares  update  law  for  9i(t) 

9i=  LiCjpei  (83) 

where  Lift)  £  R  is  an  estimation  gain  that  is  computed  as 

follows 

jt{L~1)=gCi  (84) 

and  initialized  such  that  Lf1(0)  >  0. 


Theorem  2  The  update  law  defined  in  (83)  ensures  that 
9i  — >  0  as  t  — >  oo  provided  that  the  following  persistent 
excitation  condition  [21]  holds 

ntQ+T 

7s  <  /  Ci  ( r)Ci{r)dr  <  76  (85) 

Jt  0 

and  provided  that  the  gains  (3i  satisfy  the  following  inequal¬ 
ities 


Pi  >  kzi  +  ka  HVFiill^,  (86) 

Pi  >  kzi  +  kei\\W2i\\200  (87) 

where  to, 75, 76,  T,  k3i,  ku,  ka,  ka  £  R  are  positive  constants, 
the  notation  H-H^  denotes  the  induced  co-norm  of  a  matrix 
[15],  and  k3i,ka  are  selected  such  that 


1  1  <  1 

ksi  ka  2 


(88) 


Proof:  Similar  to  the  analysis  for  the  fixed  camera  case, 
a  non-negative  function  denoted  by  V(t)  £  R  is  defined  as 
follows 

v±±eTLT1ei  +  ±gKi  +  ±n[Vi.  (89) 

After  taking  the  time  derivative  of  (89),  the  following  ex¬ 
pression  can  be  obtained 

V  =  \\Cif  ~  Pd^2  -  PM2 

-Of  cl  Ki  -  o!  cl  Vi  +  kJ  WliVcdi 

+vl  W2i£jc  (90) 

where  (83),  (84),  (78),  (79)  and  (82)  were  utilized.  Upon 
further  mathematical  manipulation  of  (90),  we  have, 

v  -  -ilW-M2 

-(Pi-kat-kt i  HIUhIIL)  l|«if 
-(^-ka-ket  \\W2i\\2J  \\Vi\\2 
+  ||^i||  ||Cill  ||  K»||  -  fei  ||  Ki  ||  2 
+  ll^ill  Pell  WWuW^  || Ki ||  -  k4i  \\WuWl,  H^ll2 
+  IlCill  1^1  Pill  -ksihif 
+  11*4  WWaW^hiW-kaWWaWlWv, 


2 


-  IICJ|2 

-{Pi-ksi-kiiWWuWDWKif 

-(fr-ksi-kti  \\W2iWDWruf 

+  T —  || ||  2  lltJcf  +  T —  ll^cll2  (91) 

rC4i  KQ  i 

where  k3i,k4i,ksi,kei  £  R  are  positive  constants  as  previ¬ 
ously  mentioned.  The  gain  constants  are  selected  to  ensure 
that 

\~~kr~ki  -  fl3i>0  (92) 


Pi  -  k3i  -  ku  \\Wu 
Pi  —  k$i  —  kei  \\W2i 


>  /x4i  >  0 

>  Msi  >  0 


where  fj,3i,  p4i,  pt5i  £  R  are  positive  constants.  The  gain 
conditions  given  by  (92),  (93)  and  (94)  allow  us  to  further 
upper  bound  the  time  derivative  of  (89)  as  follows 

V  <  -M3»|^i|  IlCif  -  k-4i  ll«i||2  -  Msi  ll»?il|2 

1  la  |2  II  ~  II 2  .  1  11  -  11 2 

+T7.\ei\ 

rv  Al  fv  Ql 

<  —Ms*  |  |  Kif  ~  »4i\\Kif  ~  V5i\\Vif 

+k-6i  Pll2  (95) 

where  /r6i  =  max  |  1  £  R.  Following  the  argument 

in  fixed  camera  case,  v(t)  £  L2,  hence 


[  Me*  ll®(T)||2dr  <  e 

Jto 


where  e  £  R  is  a  positive  constant.  From  (89),  (95)  and  (96), 
we  can  conclude  that 


L 


0i(T)  IICi(T 


+  k-4 i  l|Ki(r)||2  +  n5i  \\Vi{T)\\2)  dr 
<  P(0)  -  V{oo)  +  e.  (97) 

From  (97),  it  is  clear  that  C  € 

Z/2-  Applying  the  same  signal  chasing  arguments  as  in  the 
fixed  camera  case,  it  can  be  shown  that  6h(f),  Ki(t),  pfif)  £ 

Loo-  It  can  also  be  shown  that  9i  £  Loo  and 

therefore  ffjfi i(t)9i(t)  £  Loo-  Hence  C,i(t)9i{t)  is  uniformly 
continuous  [9],  and  since  (i(t)9i(t)  £  Loo,  we  have  [9] 

(i(t)9i(t)  — >  0  as  t  — >  00.  (98) 

Applying  the  same  argument  as  in  the  fixed  camera  case, 
convergence  of  9  fit)  to  true  parameters  is  guaranteed,  that 
is,  Ofit)  — >■  0  as  t  — >  00,  if  the  signal  (fit)  satisfies  the  per¬ 
sistent  excitation  condition  in  (85).  □ 


Remark  9  If  z(  can  be  computed  offline  as  described  pre¬ 
viously,  then,  unlike  the  fixed  camera  case,  the  knowledge 
of  si  is  not  required.  Also,  terms  from  the  rotation  matrix 
R(t)  are  not  present  in  ( 71 ),  and  therefore  the  estimate  of 
the  velocity  of  the  camera,  denoted  by  v(t),  can  be  computed 
without  the  knowledge  of  the  constant  rotation  matrix  R* . 
Note,  however,  that  R(t)  is  required  for  the  computation  of 
rotational  kinematics. 


Remark  10  As  mentioned  in  Section  ??,  decomposition  of 
the  Euclidean  homography  gives  the  rotation  matrix  R,  the 

x  f  r t ) 

normal  vector  n*  and  the  scaled  translation  vector  -A-f — . 

d* 

Since  d*  can  now  be  computed  from  any  feature  point  on  the 
plane  n*  using  (6),  this  allows  us  to  compute  Xf{t).  Hence 
the  6  degree- of- freedom  position  of  the  moving  camera  rela¬ 
tive  to  its  reference  position  can  be  computed  online. 


Appendix  C 


For  the  sake  of  clarity  in  the  subsequent  analysis,  let 
Ulfito,t)  £  R4X4  be  defined  as  follows 


tU{to,t)=  f  Wji{T)Wfi{T)dT 
Jtn 


where  Wfi(t)  £  R3x4  was  previously  defined  in  (29).  Con¬ 
sider  the  following  expression 

f*  ar ,  m  u  ^dOfir) 

/  0%  ( T)Qi(t0,T ) — - — dr 

Jto  dT 

=  Of  (T)Q.i(to,T)9i(T)\ 

\tQ 

~  J  fff  (Qi  {r)Lli(to,r)j  9i(r)dT 

=  Of  (t)Clfit0,t)Ofit)  -  f  Of  ( r)tli(to ,  t)  dr 
Jt0  dr 

-  f  dj {T)Wfi{T)Wfi(T)di{T)dT  (101) 

Jto 

where  we  utilized  (100)  and  the  fact  that  S2(fo,to)  =  0.  After 
re-arranging  (101),  we  have  the  following  expression 

df  (t)Clfito,t)0fit) 

=  2  f  of  (r)ni(to,  t)  d^f'T')  dr 

Jt0  dT 

+  f  of  {T)Wfi{T)Wfi(T)di(T)dT.  (102) 

Jto 

To  facilitate  further  analysis,  we  now  state  the  following 
lemma  [15] 


Remark  8  Utilizing  (77),  (10)  and  the  update  law  in  (83),  Lemma  3  Let  f(t)  be  a  uniformly  continuous  function  [9]. 

the  estimates  for  Euclidean  coordinates  of  all  i  feature  points  Then  lim  /(f)  =  0  if  and  only  if 

on  the  object  relative  to  the  camera  at  reference  position, 

denoted  by  ffi)(t)€  R3,  can  be  determined  as  follows  H+d 

lim  /  f(r)dr  =  0  (103) 

1  1  t—>  OO  I. 

m(t)  =  jf-A-'ph  (99) 

Ofit)  for  any  positive  constant  t'  £  R. 


After  substituting  for  t  =  to  +  T  in  (102),  where  T  £  Ris 
a  positive  constant,  and  applying  the  limit  on  both  sides  of 
the  equation,  we  have 

lim  9i  ( to  +  T)Qi(to,to  +  T)9i(to  +  T) 

to— too 

=  lim  (2  [t0+T  of  {T)ni(t0,T)^±dT 

to^oo  \  JtQ  dr 

+  J*°+T  ef  {^wj^WfiiTyeii^dT^j  .  (104) 

We  now  examine  the  terms  in  the  first  integral  of  (104). 
From  the  proof  of  Theorem  1,  9i(t)  £  Loo,  and  from 
(36)  and  (100),  Qi(to,to  +  T)  £  L It  was  also  proved 
that  Wfi{t)9i(t),r]At)  £  L^  nL2.  Hence,  from  (33),  lim 

t—t  OO 

Pei(f)  =  0  and  consequently  from  (28)  and  (34),  lim  0,  ( t )  = 

t—t  OO 

0.  Hence,  after  utilizing  Lemma  3,  the  first  integral  in  (104) 
vanishes  upon  evaluation.  From  (47)  and  Lemma  3,  the  sec¬ 
ond  integral  in  (104)  vanishes  as  well.  Therefore,  we  have 

lim  9 1  (to  +  T)fL(to5 to  +  T)9i(to  +  T )  =  0.  (105) 

to  — 1 00 

Since  Qi(to,t)  >  71-I4  from  (36)  for  any  to,  we  can  conclude 
from  (105)  that 

9i(t)  — >  0  as  t  — >  00.  (106) 


